In a digital computer, a circular queue of registers
in a register file is allocated as temporary local
storage for procedures rather than using the known 
caller/callee save convention in order to minimize 
main memory references. A called procedure dy- 
namically allocates local registers as needed without 
regard to registers used by the caller of the proce- 
dure or by any callee of the procedure, whereby 
register allocation is not restricted by any predeter- 
mined window size. Local registers (62), including 
parameter passing registers (64), are allocated in the 
called procedure, rather than a priori at compile 
time, by adjusting register stack pointer values (TOL, 
OTOL). Only the number of registers actually required
by the procedure need be allocated. Optionally,
rotating registers (74) may be allocated among
the local registers (62). Stack pointer values are 
stored in one of the parameter passing registers (64) 
when a procedure is called. Hardware register file 
access circuitry maps virtual register numbers used 
by the procedures into the hardware register file. 
Upon return from a procedure, registers are deallocated
by adjusting the register stack pointers to
the values stored when the procedure was called.

BACKGROUND

von Neumann architecture digital computers 
have a register set for holding various values during
operation. The size of the register set may
vary. All von Neumann machines have at least a 
program counter (PC). Generally, there are also 
several registers for holding operands and results 
("operational registers"). RISC (reduced instruction 
set computer) machines generally have only regis- 
ter-to-register instructions (as distinguished from 
instructions that directly access memory) except 
for LOAD and STORE instructions, which read from 
memory or write to memory but do not operate on 
the data. They tend to have larger register sets, 
numbering for example 32 or more registers. Reg- 
isters are used for holding intermediate results, 
address indexing, and passing data (parameters)
between calling and called procedures such as 
subroutines. Some processors have floating-point 
registers in addition to general registers. CISC ar- 
chitectures usually have evaluation stacks, thus 
providing for 0-address operations in which the
operands are implicit. RISC architectures usually 
do not have evaluation stacks. The compiler nor- 
mally keeps a stack in memory on RISC architec- 
tures, primarily for parameter passing and register 
spills rather than for computation. 

In most architectures, the overhead of saving 
and restoring registers on procedure calls is bur- 
densome; it can account for 5% to 40% of main 
memory references. To reduce this overhead, it is 
known to provide several banks of registers, with a 
new bank of registers allocated to each called 
procedure. This technique has been termed regis- 
ter windows. See J. Hennessy and D. Patterson, 
Computer Architecture - a Quantitative Approach 
(1990), Section 8.7. Using register windows, the 
register banks or "windows" are overlapped to 
provide a common area for passing parameters. 
Registers are divided into global registers, which 
do not change on a procedure call, and local registers,
which do change. A block of registers is saved
to memory when the buffer is full and a call follows
(window overflow), or reloaded from memory when the
buffer is empty and a return follows (window underflow).

Register windows are implemented currently in
Sun Microsystems SPARC® architecture, and are
further explained in U.S. Pat. No. 5,159,680, which
shows operating register windows in a ring configuration.
U.S. Pat. No. 5,233,691 discloses a register
window system for reducing the need for overflow-write
by prewriting registers to memory during
times without bus contention. A high performance
register file that implements overlapping windows
is disclosed in U.S. Pat. No. 5,226,142. U.S. Pat.
No. 5,226,128; U.S. Pat. No. 5,083,267; and U.S.
Pat. No. 5,036,454 disclose use of rotating registers
for loops.

One of the problems with prior art architectures
such as register windows is that the size of a bank
of registers (i.e., a register window) is fixed; it cannot
vary from procedure to procedure. As a result, not
all registers in a local register area allocated to a
procedure are actually used by that procedure, and
conversely, in many cases, procedures are not
allocated as many registers as they require.
This causes performance degradation because
memory references are not optimal.

Another limitation of register windows is that
the number of overlapping registers also is fixed.
Again, that number may well exceed the number of
parameters actually necessary for the called procedure,
again reducing the density of register usage.
Moreover, this fixed overlap imposes an arbitrary
limit on the number of passed parameters in connection
with a single procedure call.

Rotating register space is used by a software
pipelined loop in order to begin to prepare data
several cycles before an operation using it is invoked
and make the data available just at the time
the data is required. The number of registers required
in the software pipelined loop varies according
to the characteristics of the loop. If the size of
rotating register space is fixed, as in the prior art,
one must allocate ample space, e.g. 64 registers, to
cover most loops. There are, however, many small
loops which require 16 or fewer registers and many
large loops which require more than 64 registers.
For the small loops, many registers are allocated
and freed unnecessarily, and for the larger loops,
processing speed is slowed down because of the
shortage of registers.

In view of the foregoing introduction, what is 
needed is a more efficient method of allocating and 
deallocating registers that is not confined by the 
fixed group size of prior art register windows. 


SUMMARY OF THE INVENTION 

In view of the foregoing background, an object 
of the present invention is to improve average 
speed of procedure call and return operations in a
computer. 

Another object is to minimize the number of 
register saves and restores in operation of a pro- 
cessor. 

Another object is to efficiently allocate temporary
local storage needed by called procedures.

A further object of the invention is to allocate 
sufficient register storage without regard to storage 
used by the caller of a routine or by any callee of 
the routine.

A further object is to improve efficiency by 
avoiding saving and restoring registers that are not 
being used currently.

Yet another object is to reduce overhead associated
with allocating and saving a limited number
of registers available in a processor.

A still further object is to dynamically partition
a register set to meet called procedure requirements,
including allowing the full range of registers
to be utilized by any procedure that needs it.

Another object is to allocate to a procedure 
exactly the number of rotating registers required by 
it for a software pipelined loop.

Another object is to increase the density of 
register usage. 

Yet another object is to effect register saving 
and restoring without compiler intervention. 

One aspect of the present invention partitions
the physical registers into static registers and stack
registers. It permits the stack registers to be addressed
indirectly through base or relocation registers
that point into the stack. Instead of requiring
procedures to save registers at procedure calls,
and restore saved registers at procedure returns,
the present method permits every procedure to
allocate from the stack (and deallocate to the stack
on the return) a set of registers that is independent
of its caller. If such allocation does not result in a
stack overflow or underflow, no memory accesses
are required.

If hardware implements a sufficiently large
stack, then the immediate availability of local registers
to a called procedure, the availability of the
memory pipes that would otherwise have to save
and restore registers, and the improved cache behavior
resulting from the reduction in memory traffic
are expected to improve system throughput, resource
utilization, and execution time of programs.

The exact number of registers requested (and
presumably required) by a procedure is allocated
to that procedure. More specifically, according to
the invention, each procedure and each loop is
allocated exactly the number of registers required to
fit its characteristics. Thus, no registers are
allocated unnecessarily or saved/restored
unnecessarily. This feature leads to efficient
use of registers and shorter execution time.

Thus the invention includes a method of dynamically
allocating registers to procedures in a
digital computer without compiler intervention. The
method includes the steps of: defining a logical
register stack comprising a plurality of stack registers;
initializing a local relocation term (called "lrel")
to define an offset for mapping the logical register
stack into the physical register set of the computer;
allocating to a first procedure an arbitrary number
of stack registers specified by the first procedure
as local registers by initializing a first stack pointer
value (TOL) so as to delimit the local registers in
the logical register stack; and in connection with a
register access operation during execution of the
first procedure, mapping each local register into
the physical register set responsive to the local
relocation term.

In preparation for calling a second procedure, 
the method calls for storing the first stack pointer 
value (TOL) as a second stack pointer value called 
"old TOL" (OTOL); allocating to the first procedure 
a number of additional stack registers specified by 
the first procedure as parameter passing registers 
by incrementing the first stack pointer value (TOL) 
so as to include the parameter passing registers; 
and storing selected parameters in the allocated 
parameter passing registers for reference by a
called procedure. The parameter passing registers
are also mapped into the physical register set
responsive to the local relocation term.

Upon calling the second procedure, the meth- 
od further includes allocating to the second proce- 
dure an initial local register space that includes the 
first procedure parameter passing registers. This 
step makes the parameters stored in those regis- 
ters available to the second procedure without a 
memory reference. A number of additional stack 
registers as required by the second procedure are 
allocated to the second procedure as local registers
by incrementing the stack pointer value. This
allocation is done without first saving the first procedure's
local registers' contents to memory. Upon
returning from the second procedure, the inventive
method includes deallocating the local registers by
decrementing the stack pointer value by the number
of local registers. Thus the method includes
calling and returning from the second procedure
without saving and restoring local register contents.

Another aspect of the present invention is a 
register file port access circuit for providing a phys- 
ical address to a register file port. The circuit 
receives a virtual address; compares it to the static 
register address space and indicates whether or 
not the virtual address is within the static register 
address space. If so, the circuit couples the virtual 
address to the register file port as a first physical 
address for accessing a corresponding register. 
The circuit further includes circuitry for combining 
the virtual address with a local relocation term to 
form a second physical address; and means for
coupling the second physical address to the regis- 
ter file port as an address if the virtual address is 
not within the static register address space. The 
access circuit is arranged for adding the local re- 
location term to the virtual address modulo a pre- 
determined total number of physical registers. 

The foregoing and other objects, features and
advantages of the invention will become more
readily apparent from the following detailed description
of a preferred embodiment which proceeds
with reference to the drawings.

BRIEF DESCRIPTION OF THE DRAWINGS

FIG. 1 is a conceptual diagram illustrating reg- 
ister windows. 

FIG. 2 is a logical address space model for a 
set of registers. 

FIGS. 3A-3I are a series of logical address 
space models for the register set of FIG. 2 illustrat- 
ing operation of the present invention. 

FIGS. 4A-4I are a series of physical address 
space models corresponding to the logical address 
space models of FIGS. 3A-3I, respectively. 

FIG. 5 is a hardware block diagram illustrating 
register file port access circuitry for implementing
one embodiment of the present invention. 

FIG. 6 is a hardware block diagram illustrating 
register file port access circuitry for implementing 
an alternative embodiment of the present invention 
that includes rotating registers in the register stack. 

FIG. 7 is a hardware block diagram illustrating 
one example of a register file system for imple- 
menting the present invention. 

DETAILED DESCRIPTION OF A PREFERRED EMBODIMENT

Fig. 1 is a conceptual diagram that illustrates a 
prior art method of allocating registers known as 
register windows. In the following description, refer- 
ence numbers are used to refer to portions of the 
address space model as indicated in the drawing. 
Drawing reference numbers should not be con- 
fused with register numbers. We will use lower 
case r to indicate a physical register number and 
upper case R to indicate a logical or virtual register
stack number. The abbreviation "VR" means
virtual register and "PR" means physical register. 

In FIG. 1, a first window number n-1 is allocated
global registers r0 through r9 and local
registers R10 through R31. When a new procedure
is called, an additional bank of registers is allocated
to it. Referring to window number n, registers r0
through r9 remain the same since they are global.
Six registers overlap the preceding window, with
R10 to R15 of the caller's registers becoming R31
to R26 after the call. Ten registers are not included
in the windows, so there are sixteen (32 - 10 - 6)
unique registers per window even though each
procedure sees 32 registers at a time. The overlapping
registers are used for passing parameters.
Similarly, in window number n+1, R10 to R15 of
the caller's registers (window n) become R31 to
R26 after the call, again providing six overlapping
registers. As mentioned in the Background section,
the register windows scheme, with its fixed-size
partitions, causes registers to be saved even
when they are not used.
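
The window arithmetic of FIG. 1 can be checked with a short sketch. The constant and function names below are assumptions made for illustration only; the renaming rule is simply the one implied by R10 to R15 becoming R31 to R26.

```python
# Illustrative model of the fixed register-window layout of FIG. 1.
# All names are assumptions of this sketch, not taken from any specification.
VISIBLE = 32   # registers a procedure sees at one time
GLOBALS = 10   # r0-r9, shared by every window
OVERLAP = 6    # registers shared between caller and callee


def unique_per_window():
    """Registers that each additional window contributes."""
    return VISIBLE - GLOBALS - OVERLAP


def callee_name(caller_reg):
    """Name under which the callee sees a caller's overlap register."""
    if not 10 <= caller_reg <= 15:
        raise ValueError("only R10-R15 overlap the next window")
    return 41 - caller_reg   # R10 -> R31, ..., R15 -> R26
```

Because the overlap is fixed at six registers, a call passing two parameters still ties up all six, which is the inefficiency the description goes on to address.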

The present invention minimizes the number of
registers that are saved and restored at procedure
interfaces by allowing individual procedures to allocate
and deallocate an arbitrary number of registers
(limited by the size of the logical register
stack) from a pool of physical registers as needed.
Registers in this pool are accessed through indirection
or relocation pointers further described below.
The register stacks can be viewed as the tops of
software stacks that are available in registers, and
therefore easily and quickly accessed. Thus, "dynamic
allocation" of registers is controlled by
called procedures themselves rather than predetermined
by the compiler.

In one example of an embodiment of the invention,
there may be 128 fixed-point and 128 floating-point
registers. A typical hardware register file may
include 64 static registers and 64 rotating registers.
Register files may be implemented as stand-alone
integrated circuits or "on-board" a processor device.
Details of implementing the physical register
files themselves are known and not germane here.
The present invention is equally applicable to either
or both fixed-point and floating-point registers. It is
described in the context of fixed-point registers for
illustration. The following description uses these
terms:

Physical Registers (PR): The physical registers
visible to the system architecture. The actual
number of physical registers is merely a design
choice. It is assumed that the physical registers are
implemented in register files.

Virtual Registers (VR): The register number
as specified in an instruction. The VR number may
be the same as the PR number, or the VR number
may be modified as described below to determine
the address of the corresponding PR.

Static Registers: Also called Global Registers,
these are registers that do not participate in any
stacking or rotation. In other words, these registers
are accessed directly using the register address as
provided in a syllable without any indirection. In the
embodiment described below, VR addresses 0 to
31 are used unmodified to access PRs 0 to 31,
which are the static registers.

Rotating Registers: These are registers allocated
by a procedure to participate in software
pipelining and are accessed as an offset from the
RRB (rotating register base). Any procedure may
access an arbitrary number of rotating registers at
any instant, limited approximately by the number of
physical registers. The rotating registers are addressed
as a ring.

Stack Registers: The pool of registers that
participate in stacking and rotation. Through the
use of stack pointers, which specify the base or
indirection for accessing the stack registers, the
stack register pool is managed as a ring. (Rotating
registers thus are managed as a ring within a
ring.) In other words, if VR(i) corresponds to the
physically implemented stack register with the largest
address, then VR(i + 1) corresponds to the
stack register with the smallest address. In the
example described below, there are 96 stack registers
(R32-R127). The rotating registers are drawn
from the pool of stack registers. VRs 32 to 127 are
modified by base pointers to determine the corresponding
PRs (physical registers).

Local Registers: The stack registers that are 
accessible to the current procedure. 

In addition to the Rotating Register Base (RRB)
mentioned above, the preferred embodiment maintains
the following additional indirection or base
register values that point into the stack register file.
(Preferably, the processor has two copies of each
of the following base registers, one for the fixed-point
stack and another for the floating-point
stack.)

BOV (Bottom of Valid): The stack pointer marking
the bottom of the software stack that is accessible
through the register stack. Allocating across
BOV results in a stack overflow, and deallocating
across BOV results in a stack underflow, further
explained below.

BOL (Bottom of Local): The stack marker that
bounds at one end the stack registers that are
accessible to the current procedure. All the current
procedure stack registers are accessed relative to
BOL. In general, VR i accesses PR j, where P is
the total number of stack registers and i and j are
related by: j = i, if i < 32; j = [(BOL + i - 32) mod
P] + 32, if i >= 32. (This assumes PR0-PR31 are
static registers.) In the preferred embodiment, BOL
defaults to the first stack register (PR32).

TOL (Top of Local): The stack marker that
bounds at the other end the registers that are
accessible to the current procedure. An attempt by
a procedure to access a register that is not within
its local area, i.e., a register that is out of the BOL-TOL
bounds, will result in an exception.

OTOL (Old Top of Local): The value of TOL
prior to the allocation of any parameter registers.
The registers between TOL and OTOL are the
allocated parameter registers.

BOR (Bottom of rotating): The stack marker
that bounds at one end the stack registers that 
participate in rotation. 

TOR (Top of rotating): The stack marker that
bounds at the other end the stack registers that 
participate in rotation. 
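
The mapping given under BOL, together with the bounds rule given under TOL, can be sketched as follows. This is a minimal illustration, not the disclosed circuitry: it assumes BOL is expressed as an offset (0 to P-1) from the first stack register, and the function names and the `local_count` parameter (the number of registers between BOL and TOL) are hypothetical.

```python
STATIC_COUNT = 32   # PR0-PR31 are static registers (from the text)
STACK_COUNT = 96    # P: stack registers PR32-PR127 in the example embodiment


def vr_to_pr(i, bol):
    """Translate virtual register i to a physical register number.

    Assumption: bol is an offset into the ring of stack registers.
    """
    if i < STATIC_COUNT:
        return i                      # static registers are not relocated
    return (bol + i - STATIC_COUNT) % STACK_COUNT + STATIC_COUNT


def check_local(i, local_count):
    """Raise if stack register i lies outside the BOL-TOL bounds."""
    if i >= STATIC_COUNT and i - STATIC_COUNT >= local_count:
        raise IndexError(f"VR{i} is outside the local area")
```

With `bol = 10`, VR32 maps to PR42 and VR127 wraps around to PR41, which shows the ring behavior of the stack register pool.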

In general, the register stack benefit applies
only to the stack registers, and the compiler will
have to continue to adopt a caller/callee
save/restore strategy for the static registers. All
arguments and explanations apply equally to the
fixed-point stack and the floating-point stack. Each
stack has its own set of base registers. The fixed-point
register file and the floating-point register file are
each controlled separately. We will describe only
the fixed-point stack in detail to illustrate the invention.

Local, parameter, and rotating registers are allocated
or deallocated by executing a newly defined
operation, alloc. Allocation and deallocation
of local registers modifies both TOL and OTOL,
which are incremented or decremented by the number
of local registers being allocated or deallocated.
Allocation or deallocation of parameter registers
modifies only TOL, which is incremented or decremented
by the number of parameter registers being
allocated or deallocated,
as illustrated in FIG. 3 described below. Allocation
and deallocation of registers may also affect BOV
in the case of stack overflow or underflow.
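
The pointer updates performed by alloc for local and parameter registers might be modeled as below. This is a sketch under stated assumptions: the `Pointers` class and function names are hypothetical, the pointers are held as offsets into a ring of P stack registers, and overflow and underflow checking is deliberately left out.

```python
from dataclasses import dataclass

P = 96  # number of stack registers in the example embodiment


@dataclass
class Pointers:
    tol: int = 0    # Top of Local, as an offset into the stack ring
    otol: int = 0   # Old Top of Local


def alloc_locals(p, n):
    """Allocate (n > 0) or deallocate (n < 0) n local registers."""
    p.tol = (p.tol + n) % P
    p.otol = (p.otol + n) % P


def alloc_params(p, n):
    """Allocate (n > 0) or deallocate (n < 0) n parameter registers."""
    p.tol = (p.tol + n) % P   # OTOL keeps the pre-parameter value of TOL
```

After `alloc_locals(p, 10)` followed by `alloc_params(p, 4)`, the four registers between OTOL and TOL are the parameter registers.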

Allocation or deallocation of rotating registers
modifies BOR, TOR, TOL, and OTOL. If rotating
registers are being allocated, BOR is set to TOL,
and TOL, OTOL, and TOR are set to the sum of TOL
plus the number of rotating registers being allocated.
The converse modification results from
deallocating rotating registers. The mechanism described
permits variation of the number of physical
registers, for example across machine models,
without affecting programs.
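
The rotating-register allocation rule can be written out as a small sketch. The dictionary keys and the function name are assumptions for illustration, and only allocation is shown because the text describes deallocation merely as the converse.

```python
P = 96  # number of stack registers in the example embodiment


def alloc_rotating(ptrs, n):
    """Return the pointer values after allocating n rotating registers.

    ptrs is a dict with keys "tol", "otol", "bor", "tor" (hypothetical
    names), each an offset into the ring of stack registers.
    """
    new_top = (ptrs["tol"] + n) % P
    out = dict(ptrs)
    out["bor"] = ptrs["tol"]          # rotating area starts at the old TOL
    out["tol"] = out["otol"] = out["tor"] = new_top
    return out
```

For example, starting with TOL at 20 and allocating 8 rotating registers leaves BOR at 20 and TOR, TOL, and OTOL at 28.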

On a procedure call (i.e., an execution of 
branch-and-link), the current state of the base registers
is stored in parameter register 0. Thus the
compiler must allocate one more parameter
register than the number of parameters that are
passed to/from the called procedure. Additionally
BOL is set to OTOL, and OTOL is set to TOL. On a 
procedure return, the base registers are reset from 
the values stored in parameter register 0. Proce- 
dure return includes deallocation of the local and 
rotating registers of the called procedure, and may 
thus incur a stack underflow. 

The stack is said to overflow on an allocation 
when the TOL attempts to cross over BOV. Re- 
member that all the arithmetic is performed modulo 
the number of registers physically implemented in 
the stack. Or, a modulo-plus function may be used 
to skip over the fixed register address space. Simi- 
larly the stack is said to underflow on a deal- 
location when the BOL attempts to cross over BOV. 
The occurrence of overflow or underflow is detected
by the hardware and a trap handler is invoked for
appropriately spilling or restoring stack registers
to/from the software stacks.
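
One way to picture the overflow and underflow conditions is sketched below. It is not the disclosed hardware: as a simplifying assumption, the pointers are kept as unwrapped positions and reduced modulo P only when a register is addressed, which avoids the full-versus-empty ambiguity of a ring. All names are hypothetical.

```python
P = 96  # stack registers physically implemented (example value from the text)


def alloc_overflows(bov, tol, n):
    """True if allocating n more registers would move TOL across BOV."""
    return (tol + n) - bov > P


def return_underflows(bov, restored_bol):
    """True if a return restores BOL to a position below BOV, meaning the
    caller's registers were spilled and must be reloaded."""
    return restored_bol < bov


def physical_index(position):
    """Reduce an unwrapped position to an index within the stack ring."""
    return position % P
```

In this model a trap handler would spill registers starting at BOV and advance BOV on overflow, and reload registers and move BOV back on underflow.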

Note that the mechanism as described permits
the use of hardware (or software) that drains and
fills the register stacks in the background in anticipation
of stack overflows and underflows. Stack
overflows and underflows in the conventional sense
thus may be avoided by a process we call "register
cleaning" described later.

Operation of the allocation process is best explained
through the use of an example. Assume
that procedure A has BOL pointing to physical
register 38 and TOL pointing to physical register
47. Thus procedure A has 10 local registers.
Prior to calling procedure B, procedure A allocates
4 parameter registers. This sets OTOL to 47, and
TOL to 51. When the branch and link is executed,
the base register values are packed into parameter
register 0, i.e., physical register 48. (See "Control
register A" below.) Additionally, BOL is set to 47.
By placing the bottom of the called procedure (B)
local space at OTOL, the parameter registers are
common and become the bottom part of B's local
space. OTOL and TOL are set to 51. Assume that
procedure B allocates 10 local registers. This
changes TOL and OTOL to 60. If procedure B now
returns, BOL, OTOL, and TOL are reset to their
initial values of 38, 47, and 51 respectively.
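
The call and return steps of this example can be retraced with a short sketch. It models only the pointer bookkeeping (A's parameter allocation, the branch and link, and the return); the function names and the dictionary representation are assumptions, and B's own local allocation is not modeled.

```python
def call(state):
    """Branch-and-link: save the pointers, then slide the local window."""
    saved = dict(state)               # values packed into parameter register 0
    state["bol"] = state["otol"]      # callee's local space starts at OTOL
    state["otol"] = state["tol"]
    return saved


def ret(state, saved):
    """Return: reset the base registers from the saved copy."""
    state.update(saved)


a = {"bol": 38, "tol": 47, "otol": 47}   # procedure A with 10 local registers
a["tol"] += 4                            # A allocates 4 parameter registers
param0 = a["otol"] + 1                   # parameter register 0 of the example
saved = call(a)                          # A calls B
```

After `call`, BOL is 47 and OTOL and TOL are both 51; calling `ret(a, saved)` brings back 38, 47, and 51 for BOL, OTOL, and TOL.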

According to one embodiment, the values re- 
turned by procedure B to procedure A may be 
found in physical registers 48-51. Alternatively, re- 
turn values may be placed in the static registers. 
This permits deallocation of the parameter registers
immediately upon return, making them available for 
the next procedure call. 

Note also that one parameter register (here 48)
is used to store the pointer values when a procedure
is called. More specifically, return information
is stored in a control register (further explained
below), and the compiler is required to copy it to
the local register space and restore it to the control
register before returning. Preferably, the TOL and
OTOL values themselves are not saved, but instead
values are saved which allow these values to
be computed, nominally an offset to the previous
value. Using an offset value allows them to be
stored in any arbitrary register. Thus, the registers
can be rotated any arbitrary amount and the
mechanism described still works correctly. Another
alternative embodiment is to allocate an extra parameter
register for this purpose, so that the net
number of registers usable for parameter passing
equals the actual number allocated by the calling
procedure. The allocation and deallocation of rotating
registers operates in a similar manner.

Each register stack has a unique software 
stack into which registers are saved at a stack 
overflow, and from which registers are loaded on a
stack underflow. Thus each register stack truly 
represents the tip of the appropriate software stack. 

It is convenient in implementing the foregoing 
methods to provide the following control registers: 

Control register A: This packs the various
base pointers for the fixed-point stack: BOV,
BOL, TOL, BOR, TOR, and OTOL.

Control register B: This packs the different
base pointers for the floating-point stack: BOV,
BOL, TOL, BOR, TOR, and OTOL.

Control register C: This contains the memory
address of the software stack backing the register
stack for the fixed-point registers.

Control register D: This contains the memory
address of the software stack backing the register
stack for the floating-point registers.

Recall that the appropriate base pointers are stored in
parameter register 0 preparatory to executing a
procedure call.
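
Packing six base pointers into one control register could look like the sketch below. The text does not give a field layout, so the 7-bit field width (enough for 128 register numbers), the field order, and the function names are all assumptions made for illustration.

```python
FIELDS = ("bov", "bol", "tol", "bor", "tor", "otol")  # order listed in the text
WIDTH = 7                      # assumed bits per pointer (128 registers)
MASK = (1 << WIDTH) - 1


def pack(ptrs):
    """Pack the six base pointers into one integer, first field lowest."""
    word = 0
    for pos, name in enumerate(FIELDS):
        word |= (ptrs[name] & MASK) << (pos * WIDTH)
    return word


def unpack(word):
    """Recover the six base pointers from a packed integer."""
    return {name: (word >> (pos * WIDTH)) & MASK
            for pos, name in enumerate(FIELDS)}
```

Under these assumptions the packed value occupies 42 bits, so it fits in a single 64-bit parameter register.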

Turning now to Fig. 2, a logical address space
model is illustrated for a set of registers numbered
R0 through R127. Static registers 50 (R0 through
R31) are reserved, for example, for global values, 
and are not involved in the local register allocation 
mechanism. Stack registers R32 through R127 are 
indicated by reference number 58 (not an address). 
The virtual addresses illustrated by the model illus- 
trate the register stack as seen by software proce- 
dures. The virtual addresses (VR) are translated to 
actual or physical register addresses (PR) in order 
to access the physical register files as further ex- 
plained below. Initially, an unallocated address 
space 60 comprises the entire register stack. 

FIGS. 3A through 3I illustrate the virtual address
space as seen by a series of called procedures.
The called procedures are designated A, B, C
and D at the top of each drawing. Note that directions
"up" and "down" as well as the designations
"top" and "bottom" in this model are arbitrary. For 
example, one could allocate local registers down- 
ward from the "top", here R127, and wrap around 
when the "bottom" of the stack (VR32) is encoun- 
tered. We choose to illustrate the invention by 
allocating upward from R32. The principles of op- 
eration are the same as long as one is consistent. 

Turning now to Fig. 3A, the logical address 
space of Fig. 2 is shown after a call to a first 
procedure "A". A logical address space (i.e. a 
contiguous series of virtual registers) 62 is allo- 
cated as local to procedure A. The BOL ("Bottom 
of Local") pointer indicates the bottom of A's local 
space, and TOL ("Top of Local") delimits the top of 
the procedure A local address space. BOV ("Bot- 
tom of Valid") is initialized to BOL and delimits 
space that currently is allocated. Reference number 
60 indicates virtual registers (or address space) not 
yet allocated, i.e. space above TOL or below BOV. 
In Fig. 3B, procedure A allocates parameter space 
64 for passing parameters to a subsequently called 
procedure. The parameter space 64 increases the 
local address space allocated to procedure A, as 
indicated by a corresponding upward adjustment of 
the TOL pointer to the top of parameter space 64. 
Pointer OTOL indicates the TOL value prior to
allocation of parameter registers.

Procedure A next calls a procedure B. As
noted, the pointers (Control register A) are stored
in the first parameter register. Referring to Fig. 3C,
procedure B allocates a local (virtual) address
space that, as always, begins at the bottom of the
stack (VR32) as delineated by the BOL pointer. A
first portion of B's local space is mapped to the 
parameter passing registers 64 of procedure A so 
that parameter space 64 is common to procedures 
A and B. The called procedure's local space al- 
ways starts at the bottom of the stack (BOL), and it 
always begins with the calling procedure's param- 
eter space. 
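
The sharing of parameter space 64 between caller and callee can be illustrated with the mapping formula given in the BOL definition. The sketch assumes BOL is an offset into the ring of stack registers and that it points at the first register of the current local area; the register counts (ten locals, four parameters) are illustrative values.

```python
STATIC = 32   # static registers PR0-PR31
P = 96        # stack registers PR32-PR127


def vr_to_pr(vr, bol):
    """Mapping from the BOL definition; bol is an offset into the ring."""
    return vr if vr < STATIC else (bol + vr - STATIC) % P + STATIC


# Procedure A: BOL offset 0, ten locals (VR32-VR41), four parameters (VR42-VR45).
caller_bol = 0
caller_param_vrs = range(42, 46)

# After the call, B's local space begins at A's parameter space.
callee_bol = 10
callee_param_vrs = range(32, 36)

caller_prs = [vr_to_pr(v, caller_bol) for v in caller_param_vrs]
callee_prs = [vr_to_pr(v, callee_bol) for v in callee_param_vrs]
```

Both lists name the same physical registers, so the parameters are visible to procedure B without any data being moved.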

A procedure call thus may be considered as 
"pushing down" the virtual register stack such that 
the parameter passing space (e.g. 64) goes to the 
bottom. The calling procedure's local space (e.g. 
62) is "wrapped around" to the top of the address 
space in Fig. 3C, delimited by adjusting the BOV 
(Bottom of Valid) pointer. Procedure B also allocates
additional (purely local) registers 66,
bounded by the TOL pointer. As before, the remaining
unallocated address space is indicated by
60. 

FIGS. 4A through 4I model a physical address
space such as a register file. It is helpful at this 
stage to consider qualitatively the relationship of 
the virtual address space modeled in FIGS. 3A 
through 3I to the physical address space. Referring
now to FIG. 4A, the BOL and BOV pointers indicate 
the origin of the register file address space, which 
may be for example physical address 0. The pro- 
cedure A virtual address space 62 (FIG. 3A) cor- 
responds to physical address space 102 (FIG. 4A) 
delimited by the TOL pointer. FIG. 4B also shows
parameter space 104 which is allocated by proce- 
dure A and corresponds to virtual address space 
64 in FIG. 3B. Reference number 100 indicates 
address space not currently allocated in the phys- 
ical address space model. In general, reference 
numbers in FIGS. 3A-3I translate to corresponding 
reference numbers in FIGS. 4A-4I, respectively, by 
adding forty to the former. 

FIG. 4C shows the additional allocation of ad- 
dress space 106 which corresponds to the proce- 
dure B local address space 66 of FIG. 3D by 
adjusting the TOL pointer. Thus, it may be
observed that while a called procedure's virtual
address space always begins at the bottom of the
register stack, there is no corresponding relocation 
of data in the physical register file. Rather, as 
illustrated in FIGS. 4A through 4I, additional
registers are allocated by called procedures as needed
without affecting the physical address spaces pre- 
viously allocated. Next we refer again to FIG. 3D to 
consider additional procedure calls. 

Procedure B allocates parameter passing address
space 70, illustrated in Fig. 3D, by adjusting
the TOL pointer. The remaining local address
space 66, including the parameter address space
64 common to procedure A, is not affected. The
procedure A local address space 62 remains at the
top of the address space illustrated in Fig. 3D,
delineated by the BOV address pointer.

Referring to Fig. 3E, procedure B calls yet
another procedure C. The logical address space for
procedure C comprises the following. Beginning
from the bottom of the logical address space
(BOL), address space 70 is the parameter passing
space common to procedure B. Procedure C
allocates local address space 72 delineated by the
TOL pointer. The calling procedure (B) local
address space 64, 66 is "pushed down" and wraps
around to the top of the model of Fig. 3E. The
procedure A local space 62 is "pushed down" to
accommodate the present call, and BOV is moved
accordingly. In other words, the register stack
logically rotates. As always, the remaining unallocated
address space is indicated by 60.

FIGS. 4D and 4E illustrate the physical address
space corresponding to the logical address space
modeled in FIGS. 3D and 3E respectively. Referring
to FIG. 4D, parameter passing space 110
corresponds to the virtual parameter space 70
allocated by procedure B in FIG. 3D. Similarly, the
local address space 112 in FIG. 4E corresponds to
the virtual local address space 72 allocated by
procedure C in FIG. 3E.

Referring now to FIG. 3F, procedure C
allocates rotating register address space 74 in
addition to the local space 72 previously allocated. The
TOL (and TOR - see FIG. 3G) pointer indicates the
top of the rotating register space and BOR
indicates the bottom of the rotating register space.
Virtual address spaces 62, 64, and 66 are not
affected. FIG. 4F illustrates the corresponding
allocation of physical address space 114, bounded
by BOR and TOL. Note that the number of rotating
registers can be varied according to the
characteristics of software pipelined loops. Register
overflow occurs only when a procedure attempts to
allocate rotating register space in excess of the
currently available address space. This case is
described below.

Next, referring to FIG. 3G, procedure C
allocates parameter passing space 76 on top of the
rotating register space 74, in anticipation of calling
another procedure. TOL is adjusted to delimit the
parameter space. Logical address space 60
remains unallocated. FIG. 4G illustrates the
corresponding allocation of physical address space 116
for procedure C to pass parameters to another
procedure.

FIG. 3H illustrates the virtual address space
model after another procedure D is called by
procedure C. The parameter space 76 previously
allocated by procedure C appears at the bottom of
the procedure D local address space as usual.
Additionally, procedure D allocates local address
space 78 by adjusting the TOL pointer. Address
space local to the calling procedure, i.e. procedure
C (except for the common parameter passing
space 76), is pushed down and wrapped around to
the top of the model, as indicated by 70, 72 and 74
in FIG. 3H. Address spaces 66 and 64 which are
local to procedure B are pushed down accordingly.
Similarly, procedure A local space 62 is pushed
down still further on the stack, and bounded as
usual by the BOV (Bottom of Valid) pointer, leaving
address space 60 still unallocated. FIG. 4H
illustrates the corresponding allocation of physical
address space 118 for procedure D to use as local
registers.

Procedure D next attempts to allocate rotating
register space in excess of the available address
space indicated by 60 in Fig. 3H. This results in a
register overflow condition. As a result, a portion of
the memory space above the BOV pointer is saved
to memory (not shown). The saved portion includes
logical address spaces 62, 64 and part of 66. The
BOV pointer moves up as a result of the overflow
save operation, thereby freeing up additional
space. The resulting unallocated address space 60
is more than adequate to accommodate procedure
D's request for rotating registers. The result is
illustrated in Fig. 3I, where 80 indicates the
procedure D rotating register space.

Referring to FIG. 3I, procedure D has allocated
rotating register address space 80, delimited by the
BOR and TOL pointers. In this case, somewhat
more than the minimum space necessary was
saved to memory. As a result, an unallocated
portion 60 remains. This arises from arranging the
overflow save mechanism so as to move a
predetermined number of addresses, rather than
merely the minimum immediately required. The
number of addresses relocated in a save operation
preferably is selected for efficient implementation
in the subject hardware. The resulting hysteresis
can reduce the number of memory references
necessary in use. An alternative embodiment would
save only enough address space to accommodate
the pending allocation. The physical address model
after register overflow and save, and after allocating
the required rotating registers, is shown in FIG. 4I,
where 120 indicates the procedure D rotating
register space.
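
The effect of saving a predetermined number of
addresses, rather than the bare minimum, may be
illustrated with the following Python sketch. The
function name and the block size of 8 are
hypothetical; the specification does not fix a
particular block size.

```python
def spill_count(requested, free, block):
    """Registers to save so that `requested` fits, in whole blocks.

    requested: registers the procedure is trying to allocate
    free:      registers currently unallocated (cf. space 60)
    block:     assumed fixed number of registers moved per save operation
    """
    if requested <= free:
        return 0  # no overflow, nothing needs to be saved
    shortfall = requested - free
    blocks = -(-shortfall // block)  # ceiling division
    return blocks * block


# A request for 10 registers with 4 free and a block size of 8 saves one
# whole block, leaving a small unallocated remainder afterwards.
saved = spill_count(10, 4, 8)
remainder = 4 + saved - 10
```

Rounding up to whole blocks is what leaves the
unallocated remainder described above; the
alternative embodiment corresponds to a block size
of one.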

Details of register overflow save and restore
mechanisms are known. However, another aspect
of the present invention is a "cleaning" mechanism
that works together with the virtual address stack
register system so as to prevent register overflow
entirely. A "clean" register is defined as a register
having an accurate copy of its contents currently in
memory. Conversely, a "dirty" register does not
have a reliable copy of its contents in memory.
Note that a dirty register may well be valid, i.e.
currently allocated. A clean register space is
delineated by BOC (Bottom of Clean) and TOC (Top of
Clean) pointers. BOC is essentially the same as
BOV. Initially, TOC equals BOC as there are no
clean registers by definition until register contents
are copied to memory. Register cleaning is done
transparently in background, i.e. by "stealing"
otherwise idle processor cycles.

When TOC is less than BOL, some registers
have not been updated in memory. The register
cleaning mechanism copies the next register, i.e.
the value at TOC + 1, to memory. Then it
increments TOC, so that TOC always points to the top
clean register. In general, the local registers may
be ignored, as they are likely to be dirty frequently.
So it is preferred to clean only up to BOL. Note
that the cleaning process is transparent to the
software and independent of the register allocation
and deallocation methods and apparatus described.
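
One background cleaning step may be sketched in
Python as follows. The function name, the use of a
list for the register file and a dictionary for main
memory, and the omission of ring wraparound in the
pointer comparison are simplifying assumptions of
this sketch.

```python
def clean_step(registers, memory, toc, bol):
    """Perform at most one cleaning step and return the new TOC.

    registers: list standing in for the register file
    memory:    dict standing in for main memory, keyed by register address
    toc, bol:  Top of Clean and Bottom of Local pointer values
    """
    if toc < bol:
        # Copy the register at TOC + 1 to memory, then advance TOC so that
        # it again points at the top clean register.
        memory[toc + 1] = registers[toc + 1]
        toc += 1
    return toc


registers = list(range(100, 110))  # arbitrary contents for illustration
memory = {}
toc = clean_step(registers, memory, 2, 4)    # cleans register 3
toc = clean_step(registers, memory, toc, 4)  # cleans register 4
toc = clean_step(registers, memory, toc, 4)  # nothing left below BOL
```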

Register File Port Access Circuit 

Case One - Static Register Access 

Fig. 5 is a block diagram of a register file port
access circuit 140 according to the present
invention. The physical registers, e.g. 128 registers, are
provided in a series of hardware register files such
as register file 144. An access circuit of the type
shown in Fig. 5 is provided for each register file
port. One function of the circuitry is mapping a
logical address R, e.g. an address provided by a
software procedure, to a corresponding physical
register address r for accessing the register file. In
circuit 140, a logical register address R is input on
line 142 and coupled to one of three inputs to a
multiplexer 146.

A comparator 150 compares the value of R to
a constant equal to the number of global or static
registers in a particular application (32 in this
example) to determine whether the logical address is
among the static registers (i.e. R < 32). If R is less
than 32, the indicated address is within the range
of static registers and the output from comparator
150 asserts multiplexer control lines 152 so that
mux 146 selects the value R itself for input to the
register file as the physical address. In other
words, R is not modified for the static registers. As
noted above, the static registers do not participate
in the stack register operations.

Case Two - Register Stack Access

If R is equal to or greater than 32 (and qup or
qdn is not asserted), R is a valid register stack
virtual address, and it must be mapped to a
physical register address. We assume for the moment
that no rotating registers are allocated to the
current procedure. In this case, the output of a
modulo-plus adder 154 is selected through mux
146 as the physical address presented to the
register file 144. The modulo-plus adder 154 combines
the logical address R with a local relocation term
("Irel"), an offset, using modulo arithmetic in order
to determine the physical register address. The
relocation addition is performed modulo the
number of registers physically implemented in the
stack. The local relocation term Irel equals the
Bottom of Local pointer value (BOL) minus the
number of fixed registers. Note that Irel is arbitrary;
it is not restricted to any predetermined relocation
offset amount or block size. Thus, only the exact
number of registers allocated by a given procedure
are used. Conversely, exactly the same number of
registers are deallocated on a return.

To illustrate, assume the total number of
hardware registers is 128 and registers 0-31 are fixed
registers, so the register stack has 96 registers.
Next assume a virtual stack address R = 44 and
BOL = 40. Then R modulo-plus (BOL-32) equals
(44 + 8) modulo 96 = 52. There is no "wrap around"
from the modulo addition in this example. However,
if BOL = 90 then R modulo-plus (BOL-32) equals
(44 + 58 = 102) modulo 96, which equals 6,
except that the modulo-plus operation "skips over" R
0:31 so the resulting physical register file address
r = 38. In general, VR i accesses PR j, where P is
the total number of stack registers and i and j are
related by: j = i, if i < 32; j = [(BOL + i - 32) mod
P] + 32, if i >= 32.
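
The general relation between i and j may be written
out in Python as follows. This sketch takes the
closed-form expression at face value, so the offset
of 32 is added for every stack register; the
function and constant names are hypothetical, and
treating BOL as an offset from the first stack
register is an assumption of the sketch.

```python
STATIC = 32  # number of static (fixed) registers in the example
P = 96       # number of stack registers in the example


def map_register(i, bol):
    """Map virtual register i to a physical register per the relation above."""
    if i < STATIC:
        # Static registers are not relocated.
        return i
    # Stack registers: relocate modulo P, then skip over the static registers.
    return (bol + i - STATIC) % P + STATIC


# The wraparound case from the text: R = 44 with BOL = 90.
wrapped = map_register(44, 90)
```

With these values the intermediate sum is 102, which
reduces to 6 modulo 96 and then to 38 once the static
registers are skipped over.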

Case Three - Register Restoring and Cleaning

The register file port access circuit 140 also
provides access for register restoring and
"cleaning". A control signal "qup" indicates a read from
main memory to restore registers which have been
overwritten and now must be made valid again in
the register file. It is used in conjunction with stack
underflow to provide more valid registers. When
qup is asserted, it controls mux 146 to select QUP
as the address to access the register file. QUP is
the address of the next register to be restored, i.e.
BOV-1.

QUP is the address of the next register to
clean outside of the local space. This is the next
available register, i.e. one not valid, so the QUP
address is simply TOL plus 1. The contents of
main memory are copied into the register file at
that address, making that register clean by
definition. TOC is then incremented so that it always
points to the top of clean space.



A control signal "qdn" indicates cleaning a
register by copying (writing) its contents to main
memory. When qdn is asserted, it controls mux
146 to select QDN as the address to access the
register file. QDN is the address of the next
register to clean, i.e. TOC plus 1. The register cleaning
mechanism copies the contents of the register to
main memory. Then it increments TOC, so that
TOC always points to the top clean register.

Note that QUP and QDN are mutually
exclusive. A port has either one or the other, never both.
QUP is implemented on a store port in the register
file and always reads from memory. QDN is on a
read port in the register file and always writes to
memory. The notation "QUP or QDN" in FIGS. 5
and 6 is intended to convey this mutual exclusivity
without proliferating drawing figures.
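
The three-way selection performed by mux 146 in the
three cases above may be summarized in the following
Python sketch. The function name and parameter names
are hypothetical, a single flag stands in for
whichever of qup or qdn the port implements, and the
stack mapping follows the closed-form relation given
under Case Two.

```python
def port_address(R, bol, q_asserted=False, q_addr=None, static=32, P=96):
    """Select the physical address presented to the register file port.

    q_asserted: True when qup (or qdn) is asserted on this port
    q_addr:     the QUP (or QDN) address to use in that case
    """
    if q_asserted:
        # Case Three: restoring or cleaning takes the QUP/QDN address.
        return q_addr
    if R < static:
        # Case One: static registers are addressed directly.
        return R
    # Case Two: stack registers are relocated modulo the stack size.
    return (bol + R - static) % P + static
```

Giving the qup/qdn signal priority in the sketch
reflects the condition stated under Case Two, namely
that R is treated as a stack address only when
neither signal is asserted.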

Register File Port Access with Rotating Register Implementation

Turning now to FIG. 6, a register file port 
access circuit 160 is shown in block form. Circuit 
160 of FIG. 6 has certain elements in common with 
circuit 140 of FIG. 5, and like reference numbers 
indicate the common circuit elements. Description
of the common features is omitted. FIG. 6 includes
additional circuit elements for implementing
rotating registers within the register stack. As before,
the logical address R is provided on input node 
142. A comparator 164 compares the logical ad- 
dress R to the BOR (Bottom of Rotating) pointer. 
Another comparator 166 compares R to the TOR 
pointer. If R is above BOR and below TOR, this 
logical address indicates a register allocated to the 
current procedure as a rotating register. 

The physical address r equals R plus some 
rotating relocation term rrel. The rotating relocation 
term rrel equals the local relocation term Irel plus 
the rotating register base value (RRB) to account 
for rotation within the rotating register set, assum- 
ing no wraparound in the rotating registers. Thus: 

r = R + Irel + RRB

However, if there is wraparound in the rotating 
registers, then: 

r = R + Irel + RRB - (TOR - BOR)

where TOR minus BOR yields the size of the 
rotating register set. In the access circuit 160, R is 
added to rrel in adder 170 (using modulo-plus 
operation as described above) and the result pro- 
vided to mux 162. The relocation term rrel may be 
precomputed as both Irel and RRB are known in 
advance of R. For the wraparound case, an
alternate relocation term rrel# is added to R in
modulo-plus adder 172 and the result provided to
mux 162, where rrel# equals Irel + RRB - (TOR -
BOR). Which value to select in mux 162 as r is
determined as follows. Wraparound in the rotating
registers occurs when the logical address would
otherwise exceed the bounds of the rotating
registers. Thus, the question is:

R - BOR + RRB > TOR - BOR ? 

From algebra, this test is equivalent to: R > TOR -
RRB ? This is determined by comparator 168 in
FIG. 6, as it compares R to TOR - RRB. Thus, if
the result is true, comparator 168 controls mux 162
so as to select the output of modulo-plus adder
172 as the physical address r. If the result is false,
the rotating registers did not wrap around, so
comparator 168 controls mux 162 so as to select the
output of modulo-plus adder 170 as the physical
address r.
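
The selection between the two relocation terms may
be sketched in Python as follows. The function name
is hypothetical, and several points are assumptions
of the sketch rather than statements of the
specification: R is taken to be a stack address (32
or greater), the rotating set is taken to be the
addresses above BOR up to and including TOR, RRB is
taken to be non-negative and smaller than TOR - BOR,
and the modulo-plus operation is modeled by the
closed-form relation given under Case Two.

```python
def rotating_address(R, bol, bor, tor, rrb, static=32, P=96):
    """Map stack address R to a physical address, honoring rotation."""
    lrel = bol - static              # local relocation term
    rrel = lrel + rrb                # rotating relocation term
    rrel_wrap = rrel - (tor - bor)   # alternate term for the wraparound case

    def mod_plus(x):
        # Reduce modulo the stack size, then skip over the static registers.
        return x % P + static

    if not (bor < R <= tor):
        # Not a rotating register: plain local relocation.
        return mod_plus(R + lrel)
    if R > tor - rrb:
        # Wraparound within the rotating set (cf. comparator 168, adder 172).
        return mod_plus(R + rrel_wrap)
    # No wraparound (cf. adder 170).
    return mod_plus(R + rrel)
```

For example, with BOR = 40, TOR = 44, RRB = 1 and
BOL = 0, logical addresses 41 through 43 map to
physical addresses 42 through 44, while logical
address 44 wraps around to physical address 41.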

Various circuits may be devised to accomplish
the functions of circuits 140 or 160 as may be
required. For example, the cleaning address
feature may be implemented in some applications
but not others. Some applications may not provide
for rotating registers within the subject stack, in
which case circuitry like that of FIG. 5 will suffice.
Others may calculate the rotation offset RRB
elsewhere and provide the result to adder 170 as
needed. The particulars of each implementation will
be apparent to those skilled in the art in view of the
present specification, subject to performance
tradeoffs. Fast, parallel hardware is suggested, for
example, in applications where the register file port
addressing is a critical path.

FIG. 7 is a hardware block diagram illustrating
generally one example of a register file system for
implementing the present invention. A series of
registers labeled qup/qdn, TOR, BOR, RRB,
QUP/QDN, Irel, rrel and rrel# are provided for
maintaining the corresponding pointer values. These
registers are coupled over a bus 176 to provide
pointer values as needed to remapping circuitry
such as the register file port access circuit 160,
described in detail above. One such remapping
circuit is provided for each register file port used in
a register file 144. Many variations on this general
arrangement will be apparent to skilled hardware
designers in view of the purposes and operation
described above. For example, multiple pointer
values may be compacted within fewer registers.
Selected intermediate values or addresses may be
precomputed to optimize performance. Other
variations such as allocation of various tasks to
hardware versus software (including microcode) are the
subject of design tradeoffs and adaptation to a
specific implementation, all of which may be
considered equivalents to the embodiment described.



Having illustrated and described the principles 
of our invention in a preferred embodiment thereof, 
it should be readily apparent to those skilled in the 
art that the invention can be modified in arrange- 
ment and detail without departing from such princi- 
ples. We claim all modifications coming within the 
spirit and scope of the accompanying claims. 

Claims 

1. In a digital computer having a set of physical 
registers, a method of dynamically allocating 
registers to procedures without compiler inter- 
vention, the method comprising the steps of: 

defining a logical register stack (58) com- 
prising a plurality of stack registers; 

initializing a local relocation term (Irel) so 
as to define an offset for mapping the logical 
register stack into the physical register set 
(FIG.4A); 

allocating to a first procedure (A) an ar- 
bitrary number of stack registers (62) specified 
by the first procedure as local registers by 
initializing a first stack pointer value (TOL) so 
as to delimit the local registers (62) in the 
logical register stack; and 

in connection with a register access opera- 
tion during execution of the first procedure, 
mapping each local register logical address (R, 
FIG. 5) into the physical register set (r)
responsive to the local relocation term (Irel).

2. A method according to claim 1 further com- 
prising: 

storing the first stack pointer value (TOL) 
so as to form a second stack pointer value 
(OTOL); 

allocating to the first procedure (A) an ar- 
bitrary number of additional stack registers 
(64) specified by the first procedure as param- 
eter passing registers by incrementing the first 
stack pointer value (TOL) so as to include the 
parameter passing registers; and 

storing selected parameters in the allo- 
cated parameter passing registers (64) for ref- 
erence by a called procedure (B), wherein said 
storing step includes mapping the parameter 
passing registers into the physical register set 
responsive to the local relocation term (FIG. 5). 

3. A method according to claim 2 further com- 
prising: 

calling a second procedure (B); 

allocating to the second procedure an ini- 
tial local register space comprising the first 
procedure parameter passing registers (64) 
thereby making the selected parameters stored 
in said registers available to the second procedure
without a memory reference;

allocating to the second procedure an arbitrary
number of additional stack registers
specified by the second procedure as local
registers (66) by incrementing the stack pointer
value (TOL) so as to include the second
procedure local registers (66) without first saving
the first procedure's local registers' contents
to memory; and

upon returning from the second procedure,
deallocating the local registers by decrementing
the stack pointer value (TOL) by the number
of local registers (66), thereby calling and
returning from the second procedure without
saving and restoring local register contents.

4. A method according to claim 3 further com- 
prising: 

upon calling the second procedure, storing
the first and second stack pointer values
(TOL, OTOL) to form stored values (Control
register A) for reference upon a return from the
second procedure; and wherein said deallocating
step includes resetting the first and second
stack pointer values to the said stored values.

5. A method according to claim 2, 3 or 4 further
comprising allocating an extra parameter register
to the first procedure and wherein said
storing step includes storing stack pointer offset
values in said extra parameter register for
reference upon return from the second procedure.

6. A method according to claim 2, 3 or 4 further
comprising initializing a bottom of local (BOL)
pointer value to indicate one end of the stack
registers allocated to the current procedure,
the other end of the stack registers allocated to
the current procedure being indicated by the
said first stack pointer value (TOL); and
wherein the local relocation term (Irel) equals
the bottom of local pointer value less a
predetermined constant number of static registers.

7. A method according to claim 2, 3 or 4 further
comprising:

initializing a bottom of valid pointer (BOV)
for indicating a depth of a software stack
accessible through the register stack; and

wherein said incrementing the first stack
pointer value (TOL) is conducted using modulo
addition, modulo the number of physical registers,
so that the register set is managed as a
ring; and further comprising

indicating a register overflow condition
when said incrementing the first stack pointer
would result in a value greater than the bottom
of valid pointer (BOV) value.

8. A method according to claim 1 further
comprising:

initializing a first rotating register pointer 
value (BOR) and a second rotating register 
pointer value (TOR) to the first stack pointer 
value (TOL); 

allocating registers to a called procedure
(C, FIG. 3F) as rotating registers (74) by incre- 
menting the second rotating register pointer 
value (TOR) and the first stack pointer value 
(TOL) by an arbitrary number of registers 
specified by the called procedure as rotating 
registers; and 

prior to returning from the called proce- 
dure, deallocating the rotating registers by de- 
crementing the second rotating register pointer 
value (TOR) and the first stack pointer value 
(TOL) by the number of rotating registers.

9. A register file port access apparatus (140) for 
providing a physical address to access a regis- 
ter file port to implement the methodology of 
claim 1, the apparatus comprising: 

input means (142) for receiving a virtual
address (R) from a current procedure;

comparator means (150) for comparing the 
virtual address (R) to a predetermined constant 
(32) to determine whether the virtual address 
indicates a static register or a stack register; 

means (154) for adding the virtual address 
to a local relocation term (Irel) to form a first 
physical address; 

multiplexer means (146) for selecting one 
of the virtual address (R) and the first physical 
address and coupling the selected address (r) 
to the register file port; and 

control means (152) coupled to the mul- 
tiplexer means so as to select the first physical 
address if the virtual address (R) indicates a 
stack register and to select the virtual address 
if the virtual address indicates a static register, 
thereby redirecting stack register references to 
physical register addresses allocated to the 
current procedure. 

10. A register file port access apparatus according 
to claim 9 and further comprising: 

comparator means (164,166) for compar- 
ing the virtual address (R) to first and second 
rotating register pointer values (BOR, TOR) to 
determine whether the virtual address indicates 
a register allocated to the current procedure as 
a rotating register; 

means (170) for adding the virtual address 
(R) to a first rotating relocation term (rrel) to 
form a first physical address; 
means (172) for adding the virtual address
(R) to a second rotating relocation term (rrel#)
to form a second physical address;

multiplexer means (162) for selecting one
of the first and second physical addresses and
coupling the selected address (r) to the register
file port address terminal (144);

control means (168,152) for controlling the
multiplexer means (162) so as to select the
first physical address if the virtual address (R)
does not imply wraparound within the rotating
register set and to select the second physical
address if the virtual address does imply
wraparound within the rotating registers;
wherein

the first rotating relocation term (rrel)
equals the local relocation term (Irel) plus the
rotating register base value (RRB), and the
second rotating relocation term (rrel#) equals
the local relocation term (Irel) plus the rotating
register base value (RRB) less the size of the
rotating register set, thereby adjusting for the
said wraparound within the rotating register
set.
